softmax function


Deep Neural Nets with Interpolating Function as Output Activation

Neural Information Processing Systems

We replace the output layer of deep neural nets, typically the softmax function, with a novel interpolating function, and we propose end-to-end training and testing algorithms for this new architecture. Compared to classical neural nets with the softmax function as output activation, the surrogate with an interpolating function as output activation combines the advantages of both deep learning and manifold learning. The new framework demonstrates two major advantages: first, it is better suited to cases with insufficient training data; second, it significantly improves the generalization accuracy on a wide variety of networks. The algorithm is implemented in PyTorch, and the code is available at https://github.com/
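For context, the standard softmax output activation that this work replaces maps a vector of logits to a probability distribution over classes. A minimal sketch in plain Python (the function name and inputs are illustrative, not taken from the paper's repository):

```python
import math

def softmax(logits):
    """Standard softmax output activation: maps raw logits to probabilities."""
    m = max(logits)                          # subtract max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# example: three class logits -> a probability vector summing to 1
probs = softmax([2.0, 1.0, 0.1])
```

The paper's interpolating-function surrogate replaces exactly this final step, so everything upstream of the output layer is unchanged.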






Generalizable Multi-Linear Attention Network

Neural Information Processing Systems

The majority of existing multimodal sequential learning methods focus on how to obtain powerful individual representations and neglect to effectively capture the multimodal joint representation. Bilinear attention network (BAN) is a commonly used integration method, which leverages tensor operations to associate the features of different modalities.




As stated in Section A, we apply the softmax function such that RAP^softmax outputs a synthetic dataset drawn from some probabilistic family of distributions D = { σ(M) | M ∈ R^n

Neural Information Processing Systems

Σ_{i=1}^{t} q̃_i(x) (ã_i − q̃_i(D_{i−1})), which is exactly the distribution computed by MWEM. Σ_x D(x) log(D(x))  (6). The optimization problem becomes D_t = argmin_{D ∈ Δ(X)} L_mwem(D, Q̃_t, Ã_t). We show the exact details of GEM in Algorithms 2 and 3. Note that given a vector of queries Q_t = ⟨q_1, ..., q_t⟩, we define f_{Q_t}(·) = ⟨f_{q_1}(·), ..., f_{q_t}(·)⟩.

B.1 Loss function (for k-way marginals) and distributional family

For any z ∈ R, G(z) outputs a distribution over each attribute, which we can use to calculate the answer to a query via f_q. Empirically, we find that our model tends to better capture the distribution of the overall private dataset in this way (Figure 3).
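The MWEM-style distribution referenced above can be sketched directly: the synthetic distribution puts weight on each domain element x in proportion to the exponent of the accumulated query errors q̃_i(x)(ã_i − q̃_i(D_{i−1})). A toy illustration in plain Python, under that assumption (the function name, toy domain, and threshold query are all invented for this sketch, not from the paper):

```python
import math

def mwem_distribution(domain, queries, noisy_answers, past_answers):
    """Weight on x proportional to exp( sum_i q_i(x) * (a_i - q_i(D_{i-1})) ),
    the exponential-weights distribution computed by MWEM."""
    scores = []
    for x in domain:
        s = sum(q(x) * (a - past)
                for q, a, past in zip(queries, noisy_answers, past_answers))
        scores.append(math.exp(s))
    total = sum(scores)
    return [s / total for s in scores]

# toy domain {0, 1, 2} with a single threshold query q(x) = 1[x >= 1];
# the noisy answer (0.9) exceeds the previous round's answer (0.5),
# so mass shifts toward elements satisfying the query
domain = [0, 1, 2]
queries = [lambda x: float(x >= 1)]
dist = mwem_distribution(domain, queries, noisy_answers=[0.9], past_answers=[0.5])
```

GEM replaces this explicit tabular distribution with a generator network G, which is what makes the approach tractable on large domains.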


Adaptive Sampling for Efficient Softmax Approximation

Neural Information Processing Systems

The softmax function is ubiquitous in machine learning and optimization applications. Computing the full softmax evaluation of a matrix-vector product can be computationally expensive in high-dimensional settings. In many applications, however, it is sufficient to calculate only the top few outputs of the softmax function. In this work, we present an algorithm, dubbed AdaptiveSoftmax, that adaptively computes the top k softmax values more efficiently than the full softmax computation, with probabilistic guarantees. We demonstrate the sample efficiency improvements afforded by AdaptiveSoftmax on real and synthetic data to corroborate our theoretical results.
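As a point of reference, the exact computation that AdaptiveSoftmax aims to beat is the brute-force baseline: form the full matrix-vector product, exponentiate, normalize, and keep the top k entries. A minimal sketch of that baseline in plain Python (the function name and inputs are illustrative; this is the naive full computation, not the paper's adaptive algorithm):

```python
import math

def topk_softmax_exact(A, x, k):
    """Brute-force baseline: full softmax of A @ x, then the top-k (index, prob) pairs."""
    logits = [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]
    m = max(logits)                          # subtract max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    return sorted(enumerate(probs), key=lambda p: p[1], reverse=True)[:k]

# example: 3 output rows, 2-dimensional input; row 2 has the largest logit
A = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
x = [1.0, 1.0]
top2 = topk_softmax_exact(A, x, 2)
```

The cost here is a full pass over every row of A; the paper's contribution is recovering the same top-k answers, with probabilistic guarantees, from far fewer samples of the matrix-vector product.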